JJE: Applying Graph Flow Theory to Wikipedia
Authors
Abstract
The volume of data in the world, and on the Internet in particular, is growing to massive sizes. To make this information more useful, it must first be made more accessible. The INEX Initiative competition aims to identify and compare methodologies for categorizing information into clusters. The competition will be run on 60 gigabytes of data from Wikipedia, with the ultimate goal of categorizing and clustering the data accurately enough to reduce search time. In this work, we construct a clustering algorithm based on the link structure of a subset of the underlying pages. The resulting webgraph is pruned using a max-flow min-cut algorithm [10, 2, 8], which is initially seeded using different heuristics. We compare search space reduction results and construct a visualization of the clustered documents. We were able to generate clusters on the INEX data set as well as visualizations of clustered data on several different datasets.
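As a rough illustration of the max-flow min-cut pruning mentioned in the abstract, the sketch below separates a set of seed pages from a set of "outsider" pages in a small link graph and returns the pages left on the seed side of a minimum cut. It is not the authors' implementation: the function name `min_cut_cluster`, the unit capacity per link, the undirected treatment of links, the choice of outsiders, and the page labels are all assumptions made for the example, and the flow routine is a plain Edmonds-Karp search.

```python
from collections import defaultdict, deque

INF = float("inf")

def min_cut_cluster(links, seeds, outsiders):
    """Return the pages on the seed side of a minimum cut (hypothetical helper)."""
    if set(seeds) & set(outsiders):
        raise ValueError("seeds and outsiders must be disjoint")
    # Residual capacities; each link is treated as undirected with capacity 1 (assumption).
    cap = defaultdict(lambda: defaultdict(float))
    for a, b in links:
        cap[a][b] += 1.0
        cap[b][a] += 1.0
    src, snk = "__source__", "__sink__"
    for s in seeds:          # virtual source feeds every seed page
        cap[src][s] = INF
    for t in outsiders:      # every outsider page drains into the virtual sink
        cap[t][snk] = INF

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if snk not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = INF, snk
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = snk
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
    # Pages still reachable from the source form the seed-side cluster.
    return set(parent) - {src}

# Two tightly linked triangles joined by a single bridge link C-D.
links = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
cluster = min_cut_cluster(links, seeds={"A"}, outsiders={"F"})
```

In this toy graph the cheapest cut is the single bridge link, so the cluster around the seed is its own triangle. How the seeds and outsiders are chosen is exactly what the seeding heuristics in the abstract would decide; the example fixes them by hand.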
Similar resources
Analysis of Resting-State fMRI Topological Graph Theory Properties in Methamphetamine Drug Users Applying Box-Counting Fractal Dimension
Introduction: Graph theoretical analysis of functional Magnetic Resonance Imaging (fMRI) data has provided new measures of mapping human brain in vivo. Of all methods to measure the functional connectivity between regions, Linear Correlation (LC) calculation of activity time series of the brain regions as a linear measure is considered the most ubiquitous one. The strength of the dependence obl...
Wikipedia graph mining: dynamic structure of collective memory
Wikipedia is the largest encyclopedia ever created and the fifth most visited website in the world. Tens of millions of people surf it every day, seeking answers to various questions. Collective user activity on the pages leaves publicly available footprints of human behavior, making Wikipedia a great source of data for large-scale analysis of collective dynamical patterns. The dyna...
Assessing the ecological connectivity of urban green patches using graph theory: a case study of the Ahvaz metropolis
Connectivity of urban green patches is an important structural attribute of urban landscape that facilitates the species movement and transfer of their genes among their habitats. So far, several methods including Graph Theory have been applied to assess ecological connectivity. This research was aimed to study the application of graph theory to measure the connectivity of green patches in the...
Extracting Semantic Information from Wikipedia Using Human Computation and Dimensionality Reduction
Semantic background knowledge is crucial for many intelligent applications. A classical way to represent such knowledge is through semantic networks. Wikipedia’s hyperlink graph can be considered a primitive semantic network, since the links it contains usually correspond to semantic relationships between the articles they connect. However, Wikipedia is rather noisy in this function. We propose...
Notes on NP Completeness
Here are some notes which I wrote to try to understand what NP completeness means. Most of these notes are taken from Appendix B in Douglas West’s graph theory book, and also from wikipedia. There’s nothing remotely original about these notes. I just wanted all this material to filter through my brain and onto paper. I also wanted to collect everything together in a way I like. Here’s what thes...